ABSTRACT


The causative agent of severe acute respiratory 
syndrome (SARS) is a previously unidentified
coronavirus, SARS-CoV. The RNA-dependent RNA 
polymerase (RdRp) of SARS-CoV plays a pivotal 
role in viral replication and is a potential target for 
anti-SARS therapy. There is a lack of structural or 
biochemical data on any coronavirus polymerase. 
To provide insights into the structure and function 
of SARS-CoV RdRp, we have located its conserved 
motifs that are shared by all RdRps, and built a 
three-dimensional model of the catalytic domain. 
The structural model permits us to discuss the 
potential functional roles of the conserved motifs 
and residues in replication and their potential inter- 
actions with inhibitors of related enzymes. We 
predict important structural attributes of potential 
anti-SARS-CoV RdRp nucleotide analog inhibitors: 
hydrogen-bonding capability for the 2’ and 3’ 
groups of the sugar ring and C3’ endo sugar pucker- 
ing, and the absence of a hydrophobic binding 
pocket for non-nucleoside analog inhibitors similar 
to those observed in hepatitis C virus RdRp and 
human immunodeficiency virus type 1 reverse tran- 
scriptase. We propose that the clinically observed 
resistance of SARS to ribavirin is probably due to 
perturbation of the conserved motif A that controls 
rNTP binding and fidelity of polymerization. Our 
results suggest that designing anti-SARS therapies 
can benefit from successful experiences in design 
of other antiviral drugs. This work should also pro- 
vide guidance for future biochemical experiments. 


PDB code 1O5S


INTRODUCTION 


Severe acute respiratory syndrome (SARS) is a new viral 
disease that has spread to 32 countries and has resulted in more 
than 800 deaths from respiratory distress syndrome (1-3). The 
causative agent of SARS is a previously unidentified 
coronavirus, SARS-CoV (4-6), which is closely related to 
group II coronaviruses that include human virus OC43 and 
mouse hepatitis virus (7). Treatment of SARS with antiviral 
agents such as ribavirin and corticosteroids has not achieved 
satisfactory results (8). Furthermore, there is not yet a vaccine 
available for protection against SARS. 

Coronaviruses are a group of enveloped positive strand 
RNA viruses. The viral genome of SARS-CoV is a single- 
stranded RNA of 29 727 nucleotides (9-11). By analogy with 
other coronaviruses, SARS-CoV gene expression is predicted 
to involve complex transcriptional and translational events 
(12). The 5’ two-thirds of the genome encode the replicase 
gene (~21 kb) that is expressed by two very large open reading 
frames (ORFs), 1a and 1b. Expression of SARS-CoV proteins
is expected to start with translation of two polyproteins, pp1a
and pp1ab, with predicted lengths of 4328 and 7023 amino
acids, respectively. pp1ab is the result of a translational
frameshifting event at the end of ORF1a. These polyproteins
undergo co-translational proteolytic processing into at least 
four key enzymes: an RNA-dependent RNA polymerase 
(RdRp), a picornavirus 3C-like proteinase, a papain-like 
proteinase and a helicase. 

SARS-CoV RdRp is the essential enzyme in a replicase 
complex that is expected to contain additional viral and 
cellular proteins. The replicase complex is primarily used to 
transcribe: (i) full-length negative and positive strand RNAs; 
(ii) a 3’-co-terminal set of nested subgenomic mRNAs that 
have a common 5’ ‘leader’ sequence derived from the 5’ end of
the genome; and (iii) subgenomic negative strand RNAs with 
common 5’ ends and leader complementary sequences at their 
3’ ends (11,12). 

Sequence comparisons and mutagenesis studies of RdRps
from a wide range of RNA viruses have identified several 
conserved sequence motifs that are important for biological 
functions (13-19). Four of these conserved motifs exist in all 
polymerases (apart from polymerase β and multisubunit
DNA-dependent RNA polymerases) and reside in their 
catalytic domain. Crystal structures of RdRps from five 
different RNA viruses have also been reported, including 
poliovirus (PV) (20), hepatitis C virus (HCV) (21-24), rabbit 
hemorrhagic disease virus (RHDV) (25), reovirus (RV) (26) 
and bacteriophage φ6 (φ6) (27). Those studies have revealed
key aspects of the structural biology of RdRps and confirmed 
the hypothesis that RdRps share a common architecture and 
mechanism of polymerase catalysis (13). 

Given the crucial role of RdRp in the virus life cycle and the 
success obtained with polymerase inhibitors in the treatment 
of viral infections, including human immunodeficiency virus 
type 1 (HIV-1), human hepatitis B virus (HBV), HCV and 
herpes virus, SARS-CoV RdRp is an attractive target for 
development of anti-SARS drugs. Yet there are no 
structural and very limited biochemical data on coronavirus 
polymerases. 

To understand the structural basis of SARS-CoV RdRp 
enzymatic activity and potential drug susceptibility, we 
compared the sequence of SARS-CoV polymerase with 
those of PV, HCV, RHDV, RV, φ6 and HIV-1 polymerases
whose crystal structures are known. Based on sequence 
comparisons, we have located the conserved sequence motifs 
that are shared in all RdRps and built a three-dimensional 
model of the catalytic domain. We also describe the potential 
roles of specific residues in the polymerization mechanism and 
in recognition of potential inhibitors. Structural analysis of 
SARS-CoV RdRp is likely to aid the development of anti- 
SARS agents and provide guidance in the design of future 
biochemical experiments. 


MATERIALS AND METHODS 
Sequence alignments 


The sequence of SARS-CoV RdRp (932 amino acid residues; 
strain CUHK; NCBI accession no. AAP13566) was aligned 
with that of representatives of the other three groups of 
coronaviruses, and of five viral RdRps whose crystal structures
are known. The coronaviruses used in sequence alignment
are: group I, human coronavirus 229E (HcoV-229E;
NCBI accession no. NC_002645); group II, murine hepatitis
virus (MHV; NCBI accession no. NC_001846); and group III,
avian infectious bronchitis virus (AIBV; NCBI accession no.
NC_001451). The five viral RdRps are: PV, RHDV, HCV, RV
and φ6 polymerases. HIV-1 reverse transcriptase (RT), which
is both an RNA-dependent and DNA-dependent DNA 
polymerase, is also included in the sequence comparison 
because of the availability of abundant structural and 
functional data. 

Sequence alignments of SARS-CoV RdRp with other 
coronavirus RdRps were obtained using program 
CLUSTAL-W (28). The SARS-CoV RdRp shares very high 
amino acid sequence identity with other coronavirus RdRps, 
but has very low sequence similarity to other viral RdRps and 
RTs of known structures. Therefore, conventional methods
of sequence alignment are not helpful here. We carried out the
sequence comparison of SARS-CoV RdRp with other viral 
RdRps and HIV-1 RT of known crystal structures primarily 
based on manual alignments. The crystal structures of HCV, 
PV, RHDV, RV, φ6 and HIV-1 polymerases were used as
guides in the sequence alignments to identify the conserved 
sequence motifs of SARS-CoV RdRp. 

Comparison of the crystal structures of HCV, PV, RHDV, 
RV, φ6 and HIV-1 polymerases allowed us to align both the
structures and primary sequences and identify the consensus 
sequences of the conserved motifs that are shared in all RdRps 
and RTs (motifs A-G). These consensus sequences were used 
as reference points to locate the conserved motifs in SARS- 
CoV RdRp. Comparison of the sequences of SARS-CoV and 
other coronavirus RdRps with those of HCV, PV, RHDV, RV,
φ6 and HIV-1 polymerases allowed us to initially identify
motifs A, B and C in SARS-CoV RdRp. For motif A, we 
searched for two strictly conserved aspartates separated by 
four residues. Motif B was expected to contain a strictly 
conserved ‘XSG’ sequence followed by a conserved threonine 
and a conserved asparagine in a long α-helix. For motif C, we
searched for a conserved ‘XDD’ sequence. There are three
‘XDD’ sequences in SARS-CoV RdRp. We chose the first
‘XDD’ sequence as motif C because: (i) it is strictly conserved
in all coronavirus RdRps (‘SDD’ in all); (ii) it is located
between two predicted β-strands; and (iii) there are an
appropriate number of residues at the C-terminus to accom- 
modate conserved motifs D and E and the thumb subdomain. 
The locations of these three motifs facilitated the identification 
of other conserved motifs. The consensus ‘SXG’ sequence of 
motif G was identified based on the sequence alignment of 
SARS-CoV RdRp with PV and RHDV RdRps. Motif F was 
located by identifying several conserved positively charged, 
basic residues (K or R) based on the sequence alignment of 
SARS-CoV RdRp with other viral RdRps in the region 
between motifs G and A. Based on structural comparison of all 
viral RdRps and RTs of known structures, motif D appears to
contain a hydrophilic residue (R/K/E/Q) in the middle of an
α-helix, a polar residue (R/D/E/Y) at the C-terminus of the
α-helix, and an aromatic residue (F/Y/W) in the following
turn. We assigned motif D based on the alignment of a
predicted α-helix after motif C in SARS-CoV RdRp with that
in other viral RdRp structures, as well as the positions of a
hydrophilic residue (K) and a residue containing a polar group
(Y) in the C-terminus of the predicted α-helix and an aromatic
residue (F/Y/W) in the following turn. Finally, motif E was
recognized based on the position of a conserved aromatic
residue and the sequence similarity (XCS) with HCV RdRp at
the turn of the conserved hairpin structure following motif D.
Subsequently, these conserved motifs were used as landmarks
to guide further sequence and secondary structural element
alignments. In less conserved regions or regions containing
insertions or deletions, we adjusted the alignments manually
by taking account of appropriate alignment of the predicted
secondary structure of SARS-CoV RdRp with the secondary
structures of other viral RdRps, and the properties of amino
acids (hydrophobic or hydrophilic character).

Figure 1. Sequence alignment of the RdRp of SARS-CoV with those of representatives of the other three classes of coronaviruses and of five RNA viruses
with known crystal structures. The representative coronaviruses are: group I, human coronavirus 229E (HcoV-229E; NCBI accession no. NC_002645); group
II, murine hepatitis virus (MHV; NCBI accession no. NC_001846); and group III, avian infectious bronchitis virus (AIBV; NCBI accession no. NC_001451).
The five RNA viruses are poliovirus 1 strain Mahoney (PV; PDB code 1RDR), rabbit hemorrhagic disease virus (RHDV; PDB code 1KHV), hepatitis C virus
(HCV; PDB code 1QUV), reovirus (RV; PDB codes 1N35 and 1N1H) and bacteriophage φ6 (Phi6; PDB codes 1HI0 and 1HI1). HIV-1 RT (HIV-1; PDB
code 1RTD), a widely studied RNA-dependent and DNA-dependent polymerase, is also included in the comparison. The sequence in the palm subdomain and
regions containing the conserved motifs (highlighted with green bars) can be aligned confidently among different viral RdRps and HIV-1 RT. However, the
sequence in the fingers and thumb subdomains is less conserved between SARS-CoV RdRp and other viral RdRps, and the structure in those subdomains also
varies substantially among the known RdRp structures. Thus, the sequence alignment and the structural model in these regions are less reliable. Invariant
residues are highlighted in a shaded red box, and conserved residues are in red. The secondary structures of RHDV, HCV, RV and φ6 polymerases extracted
from the corresponding structures and the predicted secondary structure of SARS-CoV RdRp are shown above the sequence alignment. α-Helices are shown
as spirals and β-strands as arrows. The alignment was drawn with ESPript (66).
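
The consensus rules used above to locate motifs A, C and G can be illustrated with a short pattern search. The following is only a minimal sketch and not the procedure used in this work, which relied on manual alignment guided by crystal structures. The regular expressions, the function name find_motif and the dictionary names are assumptions introduced for illustration; the sequence fragments and their starting residue numbers are those listed in Table 1.

```python
import re

# Motif-containing fragments of SARS-CoV RdRp and their first residue
# numbers, as listed in Table 1.
FRAGMENTS = {
    "A": (612, "PHLMGWDYPKCDRAM"),
    "C": (753, "FSMMILSDDAVVCYN"),
    "G": (499, "DKSAGFPFNKWGK"),
}

# Hypothetical regular expressions encoding the search rules in the text.
PATTERNS = {
    "A": r"D.{4}D",  # two aspartates separated by four residues
    "C": r".DD",     # 'XDD'
    "G": r"S.G",     # 'SXG'
}


def find_motif(sequence, pattern, start_residue=1):
    """Return (residue_number, matched_text) for every match, overlaps included."""
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(start_residue + m.start(), m.group(1))
            for m in lookahead.finditer(sequence)]


hits = {name: find_motif(seq, PATTERNS[name], start)
        for name, (start, seq) in FRAGMENTS.items()}
for name, found in hits.items():
    print(name, found)
```

On the full-length sequence such patterns return several candidates (three ‘XDD’ sequences are noted above), so a pattern search can only propose sites; choosing among them still requires the conservation, secondary structure and spacing criteria described in the text.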

The secondary structure prediction of SARS-CoV RdRp 
was performed using the program PHD (29,30). This 
program uses cascading neural network algorithms that have 
been trained on several hundred non-homologous protein 
structures. The accuracy of PHD has been reported to be on 
average >70% as judged by the Q3 index for globular proteins.
Q3 and SOV are two widely used accuracy indices to evaluate
algorithms of protein secondary structure prediction (31,32). 
To evaluate the reliability and accuracy of the PHD program, 
we carried out positive controls. We applied the program to 
the sequences of RHDV, HCV, RV and φ6 polymerases whose
structures are known and were used in our study for structure 
comparisons (PV polymerase was not used in the test because 
its structure is less complete). The estimated accuracy of the 
predictions was judged by comparing the predicted secondary 
structures with the actual secondary structures extracted from 
the published structural results. For all four proteins, the 
overall Q3 index was 70-75% and the overall SOV index was 
67-72.5% (data not shown). The accuracy of prediction for
α-helices was higher than the average values (Q3 index of
71-82% and SOV index of 69-87.5%). The
accuracy of prediction for β-strands was relatively low (Q3
index of 43-62% and SOV index of 44-65%).
The accuracy of prediction for conserved motifs
depends on their specific secondary structure: motifs consisting
of α-helices were predicted with high reliability; motifs
forming β-strands and random coils less accurately. Therefore,
PHD was used in the secondary structure prediction of SARS-CoV
RdRp, and the resulting secondary structure prediction
should be considered as reasonably reliable, especially for the
helical regions.
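
For readers unfamiliar with the Q3 index, the sketch below shows one common way to compute it from a predicted and an observed three-state string (H, helix; E, strand; C, coil). It is an illustration under stated assumptions, not the evaluation code used here: the function names q3 and q_state are hypothetical, the per-state value is taken as the fraction of observed residues of a state that are predicted correctly (other definitions exist), the two example strings are invented toy data, and the segment-based SOV index is not implemented.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state class (H, E or C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def q_state(predicted, observed, state):
    """Percentage of observed residues in `state` that are also predicted as `state`."""
    if len(predicted) != len(observed):
        raise ValueError("strings must be of equal length")
    positions = [i for i, o in enumerate(observed) if o == state]
    if not positions:
        return None  # state absent from the observed structure
    correct = sum(predicted[i] == state for i in positions)
    return 100.0 * correct / len(positions)


# Toy example (invented strings, not data from this study).
observed = "CHHHHHHCCEEEEC"
predicted = "CHHHHHCCCEEECC"
print(round(q3(predicted, observed), 1))
print(q_state(predicted, observed, "H"), q_state(predicted, observed, "E"))
```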


Homology modeling 


Initial models of SARS-CoV RdRp were obtained using the 
MODELLER program (33) that generates three-dimensional 
structures based on amino acid sequence alignments of a 
molecule with one or more template structures. We used as 
template structures in the homology modeling the structures of 
HCV (PDB code 1QUV), PV (PDB code 1RDR), RHDV 
(PDB code 1KHV), RV (PDB codes 1N35 and 1N1H), φ6
(PDB codes 1HI0 and 1HI1) and HIV-1 (PDB codes 1RTD
and 2HMI) polymerases. The scaffold of SARS-CoV RdRp 
was based on the crystal structure of RHDV polymerase. 
Other models derived from the crystal structures of HCV, PV, 
RV and φ6 polymerases were used as additional guides in
building the molecular model of SARS-CoV RdRp. We built 
manually the less conserved regions and regions containing 
insertions and deletions using the graphics program O (34), 
consulting reference databases of known main chain and side 
chain conformations and preferred side chain rotamers. Buried 
side chains were manually adjusted to avoid steric conflict or 
to have favorable interactions with neighboring residues. 
The region of SARS-CoV RdRp residues 712-751 has no 
equivalent in other RdRps of known structure. Therefore, this 
region was not built in the current model. The N-terminal
region (residues 376-388) and the C-terminal region (residues 
891-932) are also omitted in the model because of the lack of 
consensus template structures for these regions. The final 
model was energy minimized using the molecular dynamics 
simulation procedure in program MODELLER. The quality 
and stereochemistry of the model were evaluated using the 
program PROCHECK (35). The main chain conformations for 
99.3% of amino acid residues were within the favored or 
allowed regions of the Ramachandran plot, and the overall G 
factor was -0.23, indicating that the molecular geometry of the
model is of good quality. Secondary structure assignments for 
the final model agree well with the secondary structure 
predicted from the sequence using the PHD program. 

Models of the RNA-RNA template-primer and rNTP were
built based on the structures of RV polymerase (26), φ6
polymerase (27) and HIV-1 RT (36,37) in their complexes 
with nucleic acid and NTP or dNTP substrates. The corres- 
ponding structures of these complexes were superimposed 
onto the structural model of SARS-CoV RdRp based on 
structural alignment of the palm subdomains. An A-form RNA 
template-primer duplex could be docked into the nucleic acid-binding
cleft with only minor steric conflicts with structural
elements of the protein. 


RESULTS AND DISCUSSION 


Sequence comparisons 


SARS-CoV RdRp is predicted to contain 932 amino acids. The 
N-terminal portion of SARS-CoV and other coronavirus RdRps 
is large and has no counterpart in other positive strand RNA 
virus RdRps of known structures. Thus, we consider this 
N-terminal portion of SARS-CoV RdRp (residues 1-375) as an 
N-terminal domain (NTD) and the C-terminal portion (residues 
376-932) that is equivalent to other polymerases as the 
polymerase catalytic domain. Sequence alignments and com- 
parisons indicate that SARS-CoV RdRp has a high sequence 
identity with other coronavirus RdRps (~62-73%). However, it
shares <10% sequence identity with other viral RdRps and RTs,
including PV, HCV, RHDV, RV and φ6 RdRps and HIV-1 RT
whose structures are known (Fig. 1). Normally, such a low level 
of homology would not permit reliable sequence alignment and 
homology modeling. However, we applied a stepwise protocol 
that relied on manual identification of key conserved motifs and 
used them as landmarks to guide subsequent alignment of 
primary sequence. Crucial for the sequence alignments were 
also the prediction of the secondary structure of SARS-CoV 
RdRp and the appropriate alignment of the predicted secondary 
structure elements of SARS-CoV RdRp with the secondary 
structures of PV, HCV, RHDV, RV and φ6 RdRps (as observed
in the corresponding crystal structures of these enzymes)
(Fig. 1). Prediction tests on the secondary structures of RHDV,
HCV, RV and φ6 RdRps using the PHD program give overall
accuracy indices Q3 >70% and SOV >67%. Nevertheless, it 
should be noted that the sequence alignment in the fingers and 
thumb subdomains is less reliable due to the low sequence 
similarity between SARS-CoV RdRp and other viral RdRps, 
and large structural variations among the known RdRp 
structures. 
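
The percentage identities quoted above depend on how the denominator is chosen, and the text does not specify which convention was applied. The sketch below assumes one common convention, counting only alignment columns in which both sequences have a residue; the function name percent_identity and the two short aligned strings are hypothetical and serve only as an illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned fragments (invented for illustration only).
example = percent_identity("SDDAV-CY", "GDDTVLCY")
print(round(example, 1))
```

Dividing instead by the full alignment length or by the length of the shorter sequence gives lower values, which matters most for distant pairs such as those below 10% identity.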

Figure 2. Ribbon diagram of the homology model of SARS-CoV RdRp with a docked RNA template-primer. α-Helices are shown as spirals and β-strands as
arrows. The subdomains of the catalytic domain are colored as the N-terminal portion of the fingers subdomain (376-424) in magenta, the base of the fingers
(residues 425-584 and 626-679) in blue, palm (residues 585-625 and 680-807) in red, and thumb (residues 808-932) in green.

Figure 3. Stereoview of the polymerase active site and the rNTP-binding site. The conserved sequence motifs (A-G) are highlighted. A docked rNTP substrate
is shown as a ball-and-stick model. The catalytic active site is defined by the three conserved aspartates, Asp618, Asp760 and Asp761 (shown with side
chains) that are coordinated with two divalent metal ions (shown as magenta spheres).


Structural model of SARS-CoV RdRp 


After identification of the conserved sequence motifs and 
establishment of reliable sequence alignments, we built a 
three-dimensional homology model of the catalytic domain of 
SARS-CoV RdRp based on the crystal structures of HCV, PV, 
RHDV, RV, φ6 and HIV-1 polymerases. By analogy with
other polymerases, the catalytic domain of SARS-CoV RdRp 
consists of fingers, palm and thumb subdomains that form an 
encircled nucleic acid-binding tunnel (Fig. 2). Analysis of the 
structural model permits us to discuss the potential functional 
roles of the conserved motifs and specific residues in 
polymerization (Fig. 3 and Table 1). 
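
As a convenience for locating residues discussed in the following sections, the subdomain boundaries given in the legend of Figure 2 can be written as a simple lookup. This is a sketch only: the dictionary SUBDOMAINS and the function subdomain_of are hypothetical names, the residue ranges are copied from the Figure 2 legend and from the approximate NTD boundary (residues 1-375), and the boundaries are those of a homology model rather than of an experimental structure.

```python
# Residue ranges taken from the Figure 2 legend; the NTD range is approximate.
SUBDOMAINS = {
    "N-terminal domain": [(1, 375)],
    "fingers (N-terminal portion)": [(376, 424)],
    "fingers (base)": [(425, 584), (626, 679)],
    "palm": [(585, 625), (680, 807)],
    "thumb": [(808, 932)],
}


def subdomain_of(residue):
    """Return the name of the subdomain containing `residue`, or None if out of range."""
    for name, ranges in SUBDOMAINS.items():
        if any(low <= residue <= high for low, high in ranges):
            return name
    return None


# The three catalytic aspartates and one motif F residue.
for res in (618, 760, 761, 545):
    print(res, subdomain_of(res))
```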


N-terminal domain 


SARS-CoV and other coronavirus RdRps contain an NTD 
(approximately residues 1-375 in SARS-CoV RdRp) that is 
expected to form at least one protein domain. There is no 
equivalent structural domain in other positive strand RNA 
virus RdRps with known structures. The double-stranded 
RNA RV RdRp contains an NTD that is comparable in size 
with that of SARS-CoV RdRp. However, the two NTDs share 
a very low sequence similarity (<12% identity) and contain no 
conserved sequence motif, making it difficult to perform a 
reliable sequence alignment and build a meaningful homology 
model of the NTD of SARS-CoV RdRp. The functional
implications of this domain are unclear. The NTD of RV
polymerase bridges the fingers and thumb subdomains on one
side of the catalytic cleft and participates in the formation of a
channel through which the incoming nucleotide is likely to
diffuse into the active site during polymerization (26). It is
plausible that some of the coronavirus-specific replicase and
transcription activities map to this domain. For example, this
domain may be involved in interactions with the leader or
intragenic sequences during transcription of the characteristic
nested mRNAs of coronaviruses and/or in protein-protein
interactions with the viral helicase or other viral and/or host
proteins involved in coronavirus replication.

Motif  Sequence                               Possible functions                                                            References

A      612 PHLMGWDYPKCDRAM                    Asp618: metal ion chelation                                                   (24,26,27,36,40)
                                              Asp623: recognition of rNTP sugar ring                                        (24,26,27)

B      678 GGTSSGDATTAYANSVENICQAVTANVNALLST  Ser682 and Thr687: recognition of template-primer                             (21,25-27)
                                              Ser682, Thr687 and Asn691: help sugar selection of rNTP                       (20,21,23-26)

C      753 FSMMILSDDAVVCYN                    Asp760 and Asp761: metal ion chelation                                        (24,26,27,36,40)
                                              Ser759: binding of 3’-primer terminus or priming nucleotide                   (36,37)

D      771 AAQGLVASIKNFKAVLYYQNNVFMSE         Stabilize the core structure; may also help position Asp618

E      810 HEFCSQHTMLV                        Control the flexibility of the thumb                                          (36,37,45)
                                              Cys813 and Ser814: positioning of priming nucleotide                          (26,27,37,45)

F      544 LKYAISAKNRARTVAGV                  Lys545, Lys551 and Arg553: rNTP binding and positioning of template overhang  (24,26,27,36)

G      499 DKSAGFPFNKWGK                      Positioning of template overhang                                              (26,27)


The conserved motifs of SARS-CoV RdRp are assigned based on manual sequence alignments and structural comparisons with other viral RdRps of known
structures. The highlighted residues are highly conserved in most viral RdRps. The potential functions of these motifs and specific residues are proposed for
SARS-CoV RdRp based on comparisons with other RdRps whose structures and functions are known.


Fingers subdomain 


The sequence and structure of the fingers subdomain are less 
conserved than those of the palm subdomain among different 
viral RdRps. In all known RdRp structures, the fingers 
subdomain is composed of two polypeptide segments, an 
N-terminal segment and a segment spanning motifs A and B of 
the palm subdomain. The ‘base’ of the fingers is mainly 
α-helical and the ‘tip’ of the fingers consists primarily of
β-strands and random coils. The SARS-CoV RdRp fingers
subdomain spans approximately from residues 376 to 584 and
626 to 679 and is also predicted to consist of α-helices in the
base and β-strands and coils in the tip (Figs 1 and 2). Despite
the high sequence variability among the fingers subdomains of 
different viral RdRps, there are two conserved sequence 
motifs (F and G) shared by all RdRps that play important 
functional roles in the mechanism of polymerization (Figs 1
and 3). 

Similarly to HCV and RHDV RdRps, the fingers subdomain 
of SARS-CoV RdRp contains an N-terminal portion (residues 
405-444) that forms a long loop emanating from the fingertip 
that bridges the fingers and thumb subdomains (Fig. 2). In the 
PV RdRp structure, the equivalent region is disordered, and in 
the φ6 RdRp structure it has a different structural fold but also
bridges the fingers and thumb subdomains (27). In RV 
polymerase, however, the bridging of the two subdomains is 
accomplished by the NTD (26). As a result of these 
interactions, all RdRps form an encircled nucleic acid-binding 
‘tunnel’ that can accommodate binding and translocation of a
nucleic acid without major conformational changes of the
enzymes. This is different from HIV-1 RT and other DNA 
polymerases that form a U-shaped DNA-binding cleft due to 
the lack of the fingers-thumb subdomain interaction and 
require large-scale subdomain movements to accommodate 
the template-primer and dNTP substrates. The interaction of
these subdomains is believed to ensure coordinated movement 
and help modulate initiation, elongation and termination of 
RNA synthesis by contributing to high processivity of viral 
replication (26,27). The N-terminal region of the fingers 
subdomain is also suggested to be involved in recognition of 
nucleotide substrate, protein-protein interactions and 
oligomerization of the polymerase (20,38,39). 


Motif F. Motif F contains several conserved positively 
charged, basic residues (K or R) and has been proposed to 
consist of three submotifs, F1, F2 and F3 (19). Submotif F2
does not appear to be present in SARS-CoV, RV and HIV-1
polymerases (Fig. 1). Motif F forms part of a ‘β-strand, loop
and β-strand’ structure that, similarly to the N-terminal loop
(see above), also extends from the fingers to interact with the 
thumb. The size of the loop, however, varies in different 
polymerases. In φ6 polymerase, motif F is ~60 residues
longer than in other polymerases because of two insertions
between submotifs F1 and F2 (15 residues), and submotifs F2
and F3 (40 residues), respectively (19). In SARS-CoV RdRp,
motif F contains several highly conserved basic residues,
including Lys545, corresponding to submotif F1, and Lys551
and Arg553, corresponding to submotif F3 (Figs 1 and 3; 
Table 1). 

In HIV-1 RT, the structural element containing motif F 
rotates inwards towards the polymerase active site upon 
binding of dNTP, allowing the three conserved residues 
(Lys65, Lys70 and Arg72) to interact with the triphosphate of 
the incoming dNTP (36). In HCV, RHDV, RV and φ6
RdRps, this structural element adopts a closed conformation
and has (or is proposed to have) no major conformational
change upon rNTP binding (22-24,26,27). Though the three
conserved basic residues are separated by a varying number of
residues in the primary sequence, they are structurally close to 
each other and interact with the incoming rNTP and the 
template overhang. In the structural model of SARS-CoV
RdRp, residues of motif F are also predicted to form part of the 
rNTP-binding pocket and help position the template overhang 
(Fig. 3 and Table 1). 


Motif G. Motif G consists of a conserved SXGXP sequence 
possibly followed by a conserved basic residue in many 
RdRps (18). The same motif can be found in SARS-CoV 
(corresponding to Ser501, Gly503, Pro505 and Lys511), PV 
and RHDV RdRps (Fig. 1). These residues are less conserved
in HCV, RV and φ6 polymerases and do not exist in HIV-1
RT. The segment containing motif G forms a ‘loop and
α-helix’ in most RdRp structures, except RV RdRp which has
a 16-residue insertion between the loop and the α-helix (Figs 1
and 3). In the structures of RV and φ6 polymerases, residues of
motif G contact the nucleic acid at its 5’ template overhang 
and form part of the channel for the template strand (26,27). In 
the structural model of SARS-CoV RdRp, residues of motif G 
are also predicted to be involved in positioning of the 5’ 
template strand (Fig. 3 and Table 1). 


Palm subdomain 


The palm subdomain of SARS-CoV RdRp (residues 585-625 
and 680-807) forms the catalytic core of polymerase and 
contains the four highly conserved sequence motifs (A-D)
found in all polymerases and a fifth motif (E) unique to RdRps
and RTs (14). The core structure of the palm subdomain is
well conserved across all classes of polymerases and is
composed primarily of a central three-stranded β-sheet
flanked by two α-helices on one side and a β-sheet and an
α-helix on the other (Figs 2 and 3). Residues forming the
catalytic active site are found within motifs A and C. 


Motif A. As in all viral RdRps, motif A of SARS-CoV RdRp 
contains two highly conserved aspartic acid residues separated 
by four residues (Asp618 and Asp623) (Fig. 1 and Table 1). 
Motif A is composed of a ‘β-strand and short α-helix’
structure. The β-strand of motif A, together with the β-strands
formed by motif C, forms the central β-sheet (Fig. 3). The first
aspartate (Asp618) is located near the end of the β-strand and,
together with the two strictly conserved aspartates in motif C 
(Asp760 and Asp761), forms the catalytic center of SARS- 
CoV RdRp. Structural studies of other polymerases indicate 
that the corresponding three aspartates are involved in binding 
divalent metal ions required for catalysis (24,26,27,36,40). 
As in other polymerases, mutation of any of those
aspartates in SARS-CoV RdRp is expected to abrogate
polymerase activity. The second aspartate (Asp623) is located
in the short α-helix. In the structures of HCV, RV and φ6
polymerases, the corresponding residues (Asp225, Asp590 
and Asp329, respectively) form a hydrogen bond with the 
2’-OH group of the incoming NTP and appear to be involved 
in sugar selection (24,26,27). The same interaction has 
been proposed in PV polymerase (20,22,23,41). The equivalent
residues in HIV-1 RT and MMLV RT are Tyr115 and
Phe155, respectively. These bulky hydrophobic residues form 
a steric gate that prevents binding of rNTPs because of their 2’- 
OH group (36,37,42,43). Asp623 of motif A in SARS-CoV 
RdRp is expected to also be involved in sugar selection 
(Table 1). 


Motif B. Motif B of SARS-CoV RdRp forms a ‘loop and 
α-helix’ structure and contains several highly conserved
residues (Ser682, Gly683, Thr687 and Asn691) that appear to
participate in recognition of the correct nucleic acid and
selection of the correct substrate (Figs 1 and 3). As in other
RdRp structures, the N-terminal loop of motif B contains three 
conserved residues (Ser682, Gly683 and Thr687) that appear 
to interact with the nucleotide that base-pairs with the 
incoming rNTP (21,25-27). The equivalent residues in 
HIV-1 RT (Gln151, Gly152 and Ser156) are also involved 
in positioning the template nucleotide that base-pairs with the 
incoming dNTP (36,37,42). The α-helical part of motif B,
together with an α-helix formed by motif D, packs beneath the
central β-sheet (Fig. 3). The conserved asparagine on this
α-helix (corresponding to Asn691 in SARS-CoV, Asn291 in
HCV, Asn317 in RHDV, Asn297 in PV and His691 in RV,
respectively; φ6 RdRp has a glycine at this position) is
proposed to contribute to the specificity of RdRp for rNTPs
versus dNTPs via a hydrogen-bonding interaction with the
second conserved aspartate of motif A which in turn
hydrogen-bonds to the 2’-OH of rNTP (20,21,23-26). The
equivalent residue in HIV-1 RT (Phe160) forms hydrophobic 
interactions with the side chain of Tyr115 of motif A, which in 
turn determines the substrate specificity of RT by preventing 
binding of rNTPs through steric conflicts with their 2’-OH. In 
SARS-CoV RdRp, Asn691 of motif B appears to interact with 
Asp623 of motif A through a hydrogen bond and, by analogy, 
is likely to have similar function (Table 1). 


Motif C. SARS-CoV and other coronavirus RdRps contain the 
highly conserved XSDD motif C (Leu758-Ser759-Asp760-Asp761)
at the polymerase active site. This motif forms a
‘β-strand, turn and β-strand’ hairpin structure in all types of
polymerases; the two conserved aspartates are located at the 
turn (Fig. 3). The first two residues of motif C show some 
degree of variation. The first position has an invariant leucine 
in all coronavirus RdRps which has no apparent functional 
role in the molecular model of SARS-CoV RdRp. This 
position is occupied by a tyrosine in PV (Tyr326), RHDV 
(Tyr352) and HIV-1 (Tyr183) polymerases. In the HIV-1 RT 
structure, the phenoxyl group of Tyr183 forms hydrogen 
bonds with the nucleotide bases of both template and primer 
strands and is suggested to be involved in positioning the 
template-primer (37). The equivalent residue (Gln732) in RV
polymerase does not interact with nucleic acid. Instead, it has 
contacts with the sugar ring of the NTP substrate (26). The 
corresponding residue (Lys451) in φ6 polymerase has no
contacts with either nucleic acid or rNTP substrate (27). 

The second position of motif C has a serine in all 
coronavirus and φ6 RdRps, but a glycine in PV, HCV,
RHDV and RV RdRps (Fig. 1). Other residues have been 
observed at this position in RTs and DNA polymerases (44). In 
the molecular model of SARS-CoV RdRp, Ser759 appears to 
help position the 3’-primer terminus and/or priming nucleotide 
(Table 1). The equivalent residue in φ6 polymerase (Ser452)
forms a hydrogen bond with the 3’-OH of the priming 
nucleotide (27). The corresponding residue in HIV-1 RT 
(Met184) also helps position the 3’ end of the primer strand 
and the incoming dNTP through hydrophobic interactions 
with the deoxyribose ring (36,37). 

Together with Asp618 of motif A, the two conserved
aspartates of motif C (Asp760 and Asp761) form the 
polymerase active site of SARS-CoV RdRp. The first aspartate 
(Asp760) is strictly conserved in all polymerases and is 
coordinated with the metal ions during catalysis 
(24,26,27,36,40). The corresponding residue in HIV-1 RT 
(Asp185) can also form a hydrogen bond with the primer 
terminal 3’-OH, suggesting that it might activate the 3’-OH of 
the primer strand for nucleophilic attack on the incoming 
dNTP α-phosphate (36,37,42). The second aspartate (Asp761)
is strictly conserved in all RdRps and RTs, but can be replaced 
by glutamate in several DNA polymerases (44). This residue 
does not interact with metal ions in most RdRp and RT 
structures, except in the structures of HCV and RV RdRps 
(24,26). It may help position the side chains of the other two 
aspartates and the 3’-primer terminus by interacting with the 
3’-terminal phosphate of the primer strand (36,37). 


Motif D. Although the primary sequence of motif D is not well 
conserved, this motif always forms an ‘α-helix, turn and short
β-strand’ in all known RdRp and RT structures except φ6
RdRp, which contains a seven-residue insertion between the
α-helix and the turn (Fig. 4). The α-helix of this motif flanks
the central β-sheet containing the catalytic aspartates (Fig. 3).
The C-terminal β-strand forms an antiparallel β-sheet with the
β-strand of motif A. Structural comparisons indicate that motif
D appears to contain a hydrophilic residue in the middle of the
α-helix and a polar residue at the C-terminus of the α-helix
followed by an aromatic residue at the turn (Lys783, Tyr787 
and Tyr788 in SARS-CoV; Glu341, Arg345 and Tyr346 in 
HCV; Gln345, Asp349 and Tyr350 in PV; Glu373, Asp377 
and Tyr378 in RHDV; Lys762, Glu766 and Phe767 in RV; 
Glu473, Glu477 and Tyr485 in φ6; and Lys207, Arg211 and
Trp212 in HIV-1 polymerases, respectively) (Figs 1 and 4). 
The exact functional role(s) of motif D is not yet clear. It is 
likely that motif D is involved in stabilizing the core structure 
of the catalytic domain and in helping position motif A in all 
viral RdRps, including SARS-CoV RdRp (Table 1). 
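
The residue numbering quoted above can be checked for internal consistency with a short script. The following is a minimal sketch, assuming only the residue labels listed in the text; the dictionary layout, the `phi6` key (standing for φ6) and the function names are illustrative choices, not part of the original analysis.

```python
# Motif D residue triplets as listed in the text:
# (mid-helix residue, helix C-terminal residue, aromatic residue at the turn).
MOTIF_D = {
    "SARS-CoV": ("Lys783", "Tyr787", "Tyr788"),
    "HCV": ("Glu341", "Arg345", "Tyr346"),
    "PV": ("Gln345", "Asp349", "Tyr350"),
    "RHDV": ("Glu373", "Asp377", "Tyr378"),
    "RV": ("Lys762", "Glu766", "Phe767"),
    "phi6": ("Glu473", "Glu477", "Tyr485"),
    "HIV-1 RT": ("Lys207", "Arg211", "Trp212"),
}

AROMATIC = {"Phe", "Tyr", "Trp"}


def split_residue(label):
    """Split a label such as 'Lys783' into ('Lys', 783)."""
    return label[:3], int(label[3:])


def motif_d_summary(triplets=MOTIF_D):
    """Report residue spacings and whether the turn residue is aromatic."""
    summary = {}
    for name, (mid, cterm, turn) in triplets.items():
        _, mid_pos = split_residue(mid)
        _, c_pos = split_residue(cterm)
        turn_aa, turn_pos = split_residue(turn)
        summary[name] = {
            "helix_spacing": c_pos - mid_pos,   # mid-helix to helix C-terminus
            "turn_offset": turn_pos - c_pos,    # helix C-terminus to turn residue
            "turn_is_aromatic": turn_aa in AROMATIC,
        }
    return summary


if __name__ == "__main__":
    for name, row in motif_d_summary().items():
        print(name, row)
```

With the numbers quoted in the text, the two helix residues are four positions apart in every polymerase, and the aromatic residue directly follows the helix C-terminus in all cases except φ6, where the offset of eight is consistent with the seven-residue insertion described above.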


Motif E. Motif E is present only in RdRps and RTs, and its 
primary sequence is not well conserved (14) (Fig. 1). 
However, motif E has a conserved ‘β-strand, turn and
β-strand’ structure that is part of a three-stranded antiparallel
β-sheet in all known RdRp and RT structures (Fig. 4). It is
located at the junction of the palm and thumb subdomains and 
is suggested to control the flexibility of the thumb during DNA 
polymerization (36,37,45). Structural comparison of all viral 
RdRps and HIV-1 RT studied in this work reveals a possible
consensus sequence at the turn that consists of an aromatic 
residue (F/Y/W) followed by a hydrophobic residue (L/C/M) 
and a polar residue (S/K) in most RdRps and RTs (Figs 1 and
4). The structural element containing motif E has been 
designated as the ‘primer grip’ in HIV-1 RT because the 
residues at the turn (Met230 and Gly231) help position the 
primer strand at the polymerase active site (37,45). Residues 
of the primer grip have also been implicated in processivity 
and fidelity of polymerization (46). The equivalent residues of 
motif E in RV polymerase (Leu782 and Lys783) and φ6
polymerase (Leu497 and Gly498) interact with the phosphate 
of the priming rNTP in the initiation complex (26,27). In the 
molecular model of SARS-CoV RdRp, motif E corresponds to 
residues 810-820, and the residues at the turn (Cys813 and
Ser814) help position the primer strand at the polymerase 
active site and are likely to contribute to the fidelity of 
processive polymerization (Table 1). 
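
The proposed turn consensus can be written as a simple sequence pattern. The sketch below is illustrative only: the pattern encodes the three residue classes named in the text, while the function name and the toy input fragments are hypothetical and are not real polymerase sequences.

```python
import re

# Turn consensus proposed in the text:
# aromatic (F/Y/W), then hydrophobic (L/C/M), then polar (S/K).
TURN_CONSENSUS = re.compile(r"[FYW][LCM][SK]")


def find_turn_candidates(sequence):
    """Return (1-based start position, tripeptide) for each consensus match."""
    return [
        (match.start() + 1, match.group())
        for match in TURN_CONSENSUS.finditer(sequence.upper())
    ]


# Hypothetical toy fragments, used only to exercise the pattern.
print(find_turn_candidates("GAFCSTT"))  # contains F-C-S
print(find_turn_candidates("GAWMGTT"))  # W-M-G: glycine is outside the S/K class
```

Because the consensus is described as holding in most, not all, RdRps and RTs, a strict pattern of this kind would miss turns that end in glycine, as the Met230-Gly231 and Leu497-Gly498 pairs mentioned above appear to.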


Thumb subdomain 


The C-terminal portion of SARS-CoV RdRp (residues 808- 
932) contains only the thumb subdomain. The sequence of the 
thumb subdomain is less conserved in all polymerases. The 
thumb subdomain of SARS-CoV RdRp is similar in size to 
that of PV and RHDV RdRps, but considerably smaller than 
that of HCV, RV and φ6 RdRps. It is likely to assume a similar
α-helical structure to that seen in the PV and RHDV RdRp
structures (Figs 1 and 2).

Structural studies of HIV-1 RT and other polymerases 
indicate that the thumb subdomain has great flexibility that is 
essential for nucleic acid binding and polymerization and 
appears to function as part of a translocation track during 
polymerization (36,37,45). However, due to the inflexibility of 
the nucleic acid-binding cleft in RdRps, both the fingers and 
thumb subdomains are expected to have only modest 
conformational changes upon nucleic acid binding (22,23, 
25-27). The thumb subdomain of SARS-CoV RdRp is 
predicted to have a relatively unobstructed nucleic acid- 
binding cleft that can accommodate double-stranded RNA. 
This is in contrast to HCV RdRp that has part of the nucleic 
acid-binding cleft obstructed by a β-hairpin that is proposed to
ensure replication of the 3’ portion of the genome during 
initiation (47). 


Implications for design of anti-SARS therapeutics 


Two major classes of antiviral agents that target polymerases 
have been identified: nucleoside analog and non-nucleoside 
analog inhibitors [see review by De Clercq (48)]. There are 
several reports on inhibition of flavivirus RdRps with nucle- 
oside and non-nucleoside analog inhibitors. However, there 
are no data on inhibition of coronavirus polymerase by any 
inhibitors. We review here the available biochemical data on 
inhibition of other related RdRps, most notably HCV 
polymerase, by antiviral agents in the context of the SARS- 
CoV RdRp model and discuss their inhibitory potential for and 
interactions with SARS-CoV polymerase. 


Potential nucleoside analog inhibitors of SARS-CoV 
polymerase 


Nucleoside analogs are analogs of dNTPs or rNTPs that lack 
the 3’-OH group. These inhibitors directly compete with 
nucleotide substrates for binding to the polymerase active site 
and lead to chain termination once they are incorporated into 
the elongating chain of the nucleic acid. Nucleoside inhibitors 
have been widely used in the treatment of HIV-1, HBV, HCV 
and herpes virus infections. 

It has been reported that dNTPs lacking the 3’-OH group 
(3’-dNTPs) (with cytidine as the preferred nucleobase) can 
function as chain terminators and inhibit in vitro recombinant 
HCV polymerase (Ki values ranging from 0.6 to 25 µM). However,
the inhibition in cell culture is considerably less efficient 
(49,50). Those results suggest that the 3’-OH of nucleotide 
analogs is not required for their incorporation; however, a 
polar group may be required at the 3’ position of the sugar ring 
to facilitate the activation of the prodrug and nucleoside 
metabolism. Removal of the 2’-OH resulted in elimination of
the inhibitory activity, indicating that the known chain 
terminators of DNA polymerases, dideoxynucleotides, are 
not recognized by HCV polymerase (50). Other antivirals that 
also lack a 2’-OH, such as AZT, do not inhibit RdRps (49,51), 
consistent with a requirement for a 2’-OH group. 

In the molecular model of SARS-CoV RdRp, the 2’-OH 
group of a docked canonical rNTP (and presumably of a 
nucleotide analog) interacts with Asp623 of motif A and 
Asn691 of motif B. The 3’-OH of the rNTP also forms a 
hydrogen bond with Asp623. Hence, potential nucleoside 
analog inhibitors of SARS-CoV RdRp should contain groups 
at the 2’ and 3’ positions that are capable of making hydrogen- 
bonding interactions with the neighboring Asp623 and 
Asn691. Analysis of the molecular model of SARS-
CoV RdRp also suggests that potential nucleoside inhibitors
should have the C3’ endo sugar puckering conformation to
maintain their ability to make a hydrogen bond at the 3’
position and to avoid steric conflicts at the 2’ position.
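
The three predicted attributes can be expressed as a simple screening checklist. This is a minimal sketch under stated assumptions: the `SugarFeatures` class, its field names and the two example entries are hypothetical, and the boolean flags stand in for judgments that would in practice come from structural or chemical analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SugarFeatures:
    """Hypothetical descriptor of a nucleoside analog's sugar ring."""
    name: str
    hbond_2prime: bool  # 2' group can hydrogen-bond (Asp623/Asn691 in the model)
    hbond_3prime: bool  # 3' group can hydrogen-bond (Asp623 in the model)
    pucker: str         # sugar pucker label, e.g. "C3'-endo"


def unmet_criteria(sugar):
    """List which of the three predicted attributes the analog lacks."""
    unmet = []
    if not sugar.hbond_2prime:
        unmet.append("2' hydrogen-bonding group")
    if not sugar.hbond_3prime:
        unmet.append("3' hydrogen-bonding group")
    if sugar.pucker != "C3'-endo":
        unmet.append("C3'-endo sugar pucker")
    return unmet


# Illustrative inputs only; the flags are assumptions, not measured properties.
ribo_like = SugarFeatures("ribo-like analog", True, True, "C3'-endo")
deoxy_like = SugarFeatures("2'-deoxy-like analog", False, True, "C2'-endo")

print(unmet_criteria(ribo_like))
print(unmet_criteria(deoxy_like))
```

An empty list means that a candidate satisfies all three attributes predicted from the model; it says nothing about potency, which would still have to be measured.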

Nucleoside analogs with sugar ring modifications, 2’-C- 
methyladenosine and 2’-O-methylcytidine, are effective HCV 
polymerase inhibitors (IC50s ~2 and 4 µM, respectively) (52).
The 2’-methyl group of the first inhibitor projects from the 
opposite side of the 2’-OH group of the sugar ring and interacts 
with the conserved Arg158 of motif F in the fingers subdomain 
of HCV RdRp. Similar interactions are expected with Arg553 
of SARS-CoV polymerase. The second compound has a 2’- 
methoxy group instead of the 2’-OH of rNTP. This group 
would be proximal to Asp225 of motif A and Asn291 of motif 
B in HCV polymerase. Despite some structural differences in 
the active sites of HCV and SARS-CoV polymerases, our
molecular model suggests that 2’-C-methyladenosine and 2’-O-
methylcytidine may be potential inhibitors of SARS-CoV
polymerase. 

Addition of a 5-methyl group to a pyrimidine base is 
detrimental to the potency of 3’-dNTPs as inhibitors of HCV 
polymerase (50). The 5-methyl group is in the major groove of
the base pair between the incoming rNTP and the templating 
base. This area is proximal to the conserved elements of the 
fingers subdomain (motifs G and F). Due to structural 
variation in this area between the SARS-CoV and HCV 
polymerases, this modification of the nucleobase may have a 
different effect on the inhibition of SARS-CoV polymerase. 

Design of nucleoside analogs that are non-chain terminators 
can also be pursued as a possibility for anti-SARS therapy. 
Biochemical data show that some analogs may possess 
promiscuous base-pairing properties and, once misincorpor- 
ated, they cause errors in viral replication and induce 
mutations in the viral genome (53). Recently, it has been 
shown that ribavirin monophosphate is incorporated into viral 
genomes during RNA synthesis and causes mutations because 
of its ability to pair with both uracil and cytosine (54-56). 
Ribavirin can increase the error frequency in both PV and 
HCV replication and reduces the fitness of viruses to the point 
of extinction. 


A possible mechanism for the natural resistance of 
SARS-CoV to ribavirin 


Figure 4. Structural comparison of HCV, PV, RHDV, RV, φ6 and SARS-CoV RdRps and HIV-1 RT in the regions containing motifs D and E. (A) Ribbon
presentation of motifs D and E in the structures of HCV, PV, RHDV, RV, φ6 and HIV-1 polymerases, and in the structural model of SARS-CoV polymerase.
(B) Superposition of the regions containing motifs D and E in different viral RdRp and RT structures [the color coding is the same as in (A)].

The only nucleoside analog that has been used therapeutically
against HCV infection is ribavirin, but its efficacy is limited by
the emergence of drug-resistant mutations. SARS-CoV has
natural resistance to ribavirin (8). Ribavirin resistance of PV 
polymerase can be caused by a single amino acid change, 
G64S, in an unresolved portion of the fingers subdomain (56). 
Sequence alignment and structural comparison indicate that 
the structural segment containing the equivalent residue in 
HCV, RHDV, RV and φ6 polymerases forms an α-helix that
interacts with the short α-helix of motif A, suggesting that
Gly64 of PV polymerase is located in the vicinity of the 
conserved Asp238 of motif A. This aspartate has been 
suggested to help bind the incoming nucleotide (20,26, 
27,41). The equivalent residue in HIV-1 RT (Tyr115) has
also been shown to affect the fidelity of the enzyme (57). 
Based on the present structural analysis, we propose that 
ribavirin resistance in PV polymerase caused by the G64S 
mutation is due to a change in the enzyme’s fidelity through 
the repositioning of the structurally conserved α-helix of motif
A. In the molecular model of SARS-CoV polymerase, the 
three-dimensional arrangement of the corresponding struc- 
tural elements is conserved. However, there are variations in 
the interactions between the two α-helices that may modulate
the enzyme’s fidelity and susceptibility to mutagens such as
ribavirin differently and account for the clinically observed natural
resistance of SARS-CoV towards ribavirin.

It is possible that additional structural elements that can 
affect the fidelity of polymerization may also contribute to the 
low susceptibility of SARS-CoV to ribavirin. Such structural 
elements may involve motifs F and G of the fingers subdomain 
that may ‘proof-read’ errors in the major groove of the nucleic 
acid. Similarly, motif C of the palm subdomain may also 
contribute to the fidelity of SARS-CoV polymerase by proof- 
reading mismatches in the minor groove, by analogy to what 
we have previously observed in HIV-1 RT (37). In addition, 
the ‘primer grip’ of motif E that is expected to control the 
positioning of the elongating primer strand may also con- 
tribute to the enzyme’s fidelity and susceptibility to ribavirin 
and other mutagens. 


Non-nucleoside inhibitors 


Non-nucleoside inhibitors of polymerases have been known to 
be effective therapeutics with great specificity against HIV-1, 
and are currently under development as anti-HCV drugs (58- 
60). In both cases, the inhibitors are hydrophobic in nature and 
act kinetically in a non-competitive manner with respect to 
dNTP or rNTP substrates, which is consistent with inhibitor 
binding at a site different from the nucleotide substrate. In 
HIV-1 RT, the inhibitors bind at a hydrophobic pocket that is 
proximal to, but distinct from the polymerase active site and is 
located at the palm-thumb subdomain interface. Binding of
these inhibitors causes restriction on the movement of the 
thumb, conformational changes of the residues at the 
polymerase active site, and displacement of the ‘primer grip’ 
(61-65). The HCV polymerase non-nucleoside inhibitors bind 
to a hydrophobic pocket on the surface of the thumb 
subdomain and have an allosteric effect that interferes with 
the conformational change of the thumb (59). 

In the structural model of SARS-CoV RdRp, there is no 
hydrophobic pocket similar to that of HIV-1 RT near the 
polymerase active site. Besides, the thumb subdomain of 
SARS-CoV RdRp is considerably smaller than that of 
HCV RdRp and a substantial part of the non-nucleoside
inhibitor-binding pocket of HCV does not exist in SARS-CoV
RdRp. Thus, the non-nucleoside inhibitors that
inhibit HCV or HIV-1 polymerase are unlikely to work for
SARS-CoV polymerase. Nevertheless, different allosteric
sites may exist in SARS-CoV polymerase that can be targeted 
for developing antivirals. Information on novel inhibitor- 
binding sites is likely to emerge as detailed structural data become
available and/or new inhibitors of SARS-CoV polymerase are 
discovered through high-throughput drug screening efforts. 


Conclusion 


Although the SARS pandemic appears to be currently under 
control, the lack of effective therapeutics against a potentially 
devastating disease that could re-emerge at any time has 
triggered intensive research efforts to identify possible 
vaccine and chemotherapeutic strategies. Because of its 
pivotal role in viral replication, SARS-CoV polymerase is 
an excellent target for anti-SARS drugs. Despite substantial 
differences between the polymerases of SARS-CoV and other 
RNA viruses, we were able to build a three-dimensional 
homology model of the catalytic domain of SARS-CoV 
polymerase. In the absence of any biochemical and structural 
data on coronavirus polymerases, this model provides the first 
insights into the functional roles of conserved residues and 
motifs of this enzyme and a structural basis to evaluate 
potential interactions with inhibitors of related enzymes. This 
information should be helpful in designing anti-SARS agents 
and provide guidance for future biochemical experiments. 


Protein Data Bank accession code 


The full atomic coordinates of the catalytic domain of SARS- 
CoV polymerase have been deposited with the RCSB Protein 
Data Bank (entry 105S) for immediate release. 